[Figure: ring members; labels: dma, cpu, switch fabric, ethernet]

How Rings assign address space

Step1: align incoming address to self (to some power of 2)
Step2: assign the result to self address

Step3: next_addr = self_addr + self_addr_space; // number of registers used locally
Step4: send down next_addr



Example: 

Dma needs 16 addrs
Uart needs 4
Timer needs 256
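
The four steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: "align" is taken to mean rounding the incoming address up to a multiple of the member's own address space, the function names `align_up` and `enumerate_ring` are hypothetical, and the member order (dma, uart, then the member needing 256 addresses) and the incoming address 3 are read off the example figure.

```python
def align_up(addr: int, size: int) -> int:
    """Round addr up to the next multiple of size (size must be a power of 2)."""
    if size <= 0 or size & (size - 1):
        raise ValueError("size must be a power of 2")
    return (addr + size - 1) & ~(size - 1)


def enumerate_ring(incoming_addr, members):
    """Walk the ring once and assign an address window to every member.

    members is a list of (name, address_space) pairs in ring order.
    Returns a list of (name, self_addr, next_addr) tuples.
    """
    result = []
    addr = incoming_addr
    for name, space in members:
        self_addr = align_up(addr, space)  # Step1 and Step2
        next_addr = self_addr + space      # Step3
        result.append((name, self_addr, next_addr))
        addr = next_addr                   # Step4: send next_addr down the ring
    return result


# Assumed ring order and incoming address, taken from the example figure.
ring = enumerate_ring(3, [("dma", 16), ("uart", 4), ("third", 256)])
for name, self_addr, next_addr in ring:
    print(name, self_addr, next_addr)
```

Under these assumptions the second member gets self=32 and sends down 36, and the third member gets self=256 and sends down 512, which agrees with the values shown in the example figure.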



[Figure: enumerated addresses along the ring: Addr=3, self=32, Addr=36, self=256, Addr=512]

[Figure: two flip-flops in series, clocked by clk1 and clk2]
If the delay between "clk1" and "clk2" is
greater than the delay from Q to D of the
second flip-flop, we have a race on our
hands, meaning the right-hand flip-flop will
sample the data of Q a whole clock period
early.
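
The condition above reduces to a single comparison. The sketch below is a simplified check, not a sign-off timing analysis: the function name `has_race` and the nanosecond figures are hypothetical, and flip-flop hold time is ignored because the text does not mention it.

```python
def has_race(clock_delay_ns: float, q_to_d_delay_ns: float) -> bool:
    """Return True when the second flip-flop can sample new data a cycle early.

    clock_delay_ns  : delay of the second clock relative to the first clock
    q_to_d_delay_ns : delay from Q of the first flip-flop to D of the second
    """
    return clock_delay_ns > q_to_d_delay_ns


# Hypothetical numbers, for illustration only.
print(has_race(clock_delay_ns=1.2, q_to_d_delay_ns=0.8))  # clock arrives after data changed
print(has_race(clock_delay_ns=0.3, q_to_d_delay_ns=0.8))  # data changes after the clock edge
```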




compound A compound 



clock runs with data 

The problem is a possible race.
However, we control the logic on
each flip-flop leaving the compound.
Because it is always the same standard
ring-interface module, we can ensure
that the delay will be at least enough
and, more importantly, that it is easily
checked after layout.



[Figure labels: compound A, data_a, compound B]



clock opposes data 

This arrangement has the advantage
of automatically ensuring that no race
condition exists (at least in this simple
case).

[Figure label: data_b]




data_a, which changes after clkb
(which is later than clka), is sampled
by clka. NO RACE.




[Figure: compound A waveforms; labels: data, clock, uncertainty range]



data_a leaving the bridge goes to member "b" and
should be sampled there by the rising edge of clkb.
clkb lags a lot behind clka of the bridge. As clearly
seen from the waveforms, a race is imminent. Here we
should add latches for all the data lines (about 90).
Adding a latch works, however, only if the delay between
clka and clkb is less than 75% of the cycle time;
otherwise the uncertainty kills the usable time. This
sets a hard limit on the number of ring members.
Also keep in mind that latches are needed on each OK
signal between members of the ring.

[Figure: waveforms; labels: data, uncertainty range]



Here, data_b leaves member "b" to be sampled by
clka in the bridge. But now clkb lags a lot behind
clka. This actually works to our advantage, if the
lag is smaller than the better part of a clock cycle.
This solution looks better, because between adjacent
members we can take care to delay the data
beyond the danger zone of the clock delay, the OK
signals are covered automatically, and the last-leg
data is also covered. The only signal that is not safe
is the OK from the bridge to the "b" member. It will
need a latch in "b".



[Figure: big module with its ring interface; labels: local_clock, data_in (data from previous member), clk, local_data_out, data, clock]



The local clock lags behind the
ring-interface clock of this
module, because we presume the
module is big. For data coming
out this is not a problem: it changes
later than the ring-i/f flip-flops' clock.
However, for data entering the
module from the previous member,
a race is a possibility we must
look into.

If module "a" sends a message to module "b", the ring works
fine. However, if most of the traffic is from "c" to "b",
this is more expensive in terms of latency.
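
The latency difference can be made concrete by counting hops. This sketch assumes a unidirectional ring with four members in the order a, b, c, d; both the member count and the order are assumptions for illustration, and `ring_hops` is a hypothetical helper.

```python
def ring_hops(members, src, dst):
    """Number of member-to-member hops from src to dst on a one-way ring."""
    n = len(members)
    return (members.index(dst) - members.index(src)) % n


members = ["a", "b", "c", "d"]  # assumed ring order
print("a -> b:", ring_hops(members, "a", "b"), "hop(s)")
print("c -> b:", ring_hops(members, "c", "b"), "hop(s)")
```

With this ordering a message from "a" reaches "b" in one hop, while a message from "c" has to travel almost the whole ring to reach "b".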



Another problem is "peak latency". Suppose that "a"
transmits mostly to "d" and "b" mostly to "c". In this case
communication between "b" and "c" suffers degradation
when the peak traffic periods coincide.

The land bridge gets its name from the fact that it is
a luxury. It spans across connected modules.
The idea is simple. When V2 sends a message to
D1, it gets to one side of the bridge. This side
analyzes the destination address and by some
magic (explained later) decides to short-cut the
path. The message re-appears at the other end of
the bridge and gets to D1 fast. By the same magic,
a message from V1 to D2 is bypassed as well;
a message from V1 to D1 is treated directly.




Enumeration is started by the "Anchor",
which assigns address=1 to itself. The results
of enumeration are labels 1 to 7. The land
bridge gets two addresses, as if it were not
one module: there is the "near" end, which got
enumeration label "3", and the "far" end,
marked 6.

If msg1 and msg2 arrive at the same time,
the bridge end must decide
which message to forward first.

It can be shown that an unwise decision can
lead to freeze-out, deadlock and the option price
dropping to $5.

Therefore MSG2 gets the priority.

The bridge takes responsibility for strays,
but only at the "far" end. During
enumeration, the bridge is "polarized" to
have a near and a far end. Near is the end
first struck by the enumeration message.

So we have exactly one enforcer for each
ring.



[Figure labels: 3 near, Anchor, d2, 12, 11 far]



In a land bridge ring, the situation is trickier. If V2
sends a message to address=5, the land bridge
diverts it at the 11/far end. It will re-appear at 3 and
start cycling forever.

We have to define an algorithm that will take
care of all cases.

Luckily there is a way.

The land bridge deals only with messages arriving
at the far end and being diverted. It marks and
monitors only those. Messages arriving at the near
end keep their markings. Messages at the far end
going through are left alone.



APP ID= 10064337 



Page 240 of 281 








[Figure: link from memberA to memberB; signals: type, addr, data (64), ok, scan test, reset, clk]

[Figure: timing diagram; clk, type (idle, msgA, msgB, idle), ok]



During the first clock, OK remains active while type
is that of msgA. This means that on the next clock
memberA may send a new message. memberA uses
this OK to send msgB on the next clock. msgB gets
stuck for a clock because OK goes inactive. It goes
inactive because the FIFO in memberB is full. One
clock later the FIFO has a free entry, so OK returns to
1 and type returns to idle on the next clock. The return
to idle could also be a change to the next message, if
there was one.
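
The waveform described above can be reproduced with a small cycle-by-cycle model. It is a behavioral sketch only: `drive_type_line` is a hypothetical name, the OK signal is supplied as a list of per-clock values instead of being derived from a modeled FIFO, and the sender is assumed to hold a message on the type line until a clock on which OK is sampled active.

```python
def drive_type_line(messages, ok_per_clock):
    """Return what the sender drives on the type line at each clock.

    messages     : message names queued in the sender, in order
    ok_per_clock : OK value (1 or 0) seen by the sender at each clock
    """
    pending = list(messages)
    trace = []
    for ok in ok_per_clock:
        current = pending[0] if pending else "idle"
        trace.append(current)
        if pending and ok:
            # OK active: the receiver took the message, so the sender
            # may drive a new one (or idle) on the next clock.
            pending.pop(0)
    return trace


# OK drops for one clock while the receiving FIFO is full.
trace = drive_type_line(["msgA", "msgB"], [1, 0, 1, 1])
print(trace)
```

In this run msgB stays on the type line for two clocks, matching the description of a message that gets stuck for one clock.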



[Figure labels: imsg, iok, omsg, ook]

The incoming messages are examined first
to see whether they are supervisor or work/program.
Work/program messages have an address field.
We check whether it is our address. Since we know
that our address is aligned to our power of 2,
the address mask (named the split mask)
causes only a certain number of upper bits to
be compared. The lower part of the address
is passed inside as the internal address. The
upper bits are compared against the self-address
register. This register gets its value during
the enumeration protocol. The lower part of this
register is always masked. Hopefully
synthesis will delete the unused bits from the
implementation.
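
The comparison can be sketched as a pair of bit masks. The 24-bit address width is an assumption (the drawings elsewhere show address buses labeled [23:0]), and `decode_address` and its parameters are hypothetical names used only for illustration.

```python
ADDRESS_WIDTH = 24  # assumed width of the ring address


def decode_address(incoming: int, self_addr: int, space: int):
    """Decide whether a work/program message is for this member.

    space is the member's address space, a power of 2. Returns a pair
    (ours, internal_addr): ours is True when the upper bits match the
    self-address register, internal_addr is the lower part passed inside.
    """
    low_mask = space - 1                                   # don't-care part of self-address
    split_mask = ((1 << ADDRESS_WIDTH) - 1) & ~low_mask    # upper bits that are compared
    ours = (incoming & split_mask) == (self_addr & split_mask)
    return ours, incoming & low_mask


# A member enumerated at 0x100 with 256 addresses:
print(decode_address(0x105, self_addr=0x100, space=256))  # ours, internal address 5
print(decode_address(0x024, self_addr=0x100, space=256))  # passes through
```

Because the self-address is aligned to the member's power of 2, the lower bits of the register never take part in the comparison, which is why the text expects synthesis to remove them.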




[Figure: address comparator; inputs: incoming address, address split mask, self-address (with its don't-care part); outputs: ours/through, part of the address that enters the member]

[Figure (Fig. 33): Member; ports Rif_i_*, Rif_*, Rif_o_*, module_id; RIF; Address Space = 7; Activation register]

The second land bridge solves most traffic problems, but
adds 4 clocks to the overall ring length. This is not a big
problem because no message should travel the whole
perimeter.




The Utopia interface is
forced into a mode that
communicates in
messages, not cells. We are
using the I/O and maybe
some of the logic.



[Figure: Vobla Network Processor block diagram]
Application Specific Accelerators: CRC, Encryption, Table Lookup, Hashing
Internal Memory: Fast, Unified, Multi-port
Peripheral Expansion: Enet, ATM, Uart, USB, Serials
System Expansion Area: CPU (PP), DMA, Smart FIFO, Ext. mem I/F

[Figure: Vobla Compound block diagram]
Internal Memory
Vobla Core: FTU, LSU, Program Sequencer (PSU), PBU, Register File, Arithmetic (DALU)
Agent I/F (AGI), Agent Bus, Debug, DMA Agent

[Figure: Vobla resources]
Resources: general purpose registers, special purpose registers, agent interface, configuration registers, internal memory, external memory
Annotations: 32 registers of 32 bits per task. A set of indications per task, which control task execution. An interface to adjacent resources. Fast memory accessed by load/store instructions. Per task and global registers. Initialized by the PP. Big area accessed via a DMA interface.
R1 register:

s - sticky bit

eq - equal/zero

lt - less than/negative

gt - greater than/positive

c - carry

mb - reflection of the RAM multi-reader busy indication.

[Figure: register bit layouts, bits 31 to 0; legible labels: TASK SPR; the remaining register names are illegible]
Frame structure
of an example
task type

[Figure: frame layout; parts: common task data, task fragment 1 data, task fragment 2 data, task fragment 3 data, data of all level2 functions, level 1 function data (several entries)]

Size of the level0 frame part is different for each task type.
Size of the level2 frame part is constant.
Size of the level1 frame part is different for each task type.
[Figure: pipeline stages FETCH, DECODE, ADDRESS, EXECUTE, WRITE; blocks: DECODE LOGIC, ADDRESS ALU, DAB; signals: address, from memory, addr calc, data address, loaded data, stored data]
[Figure: multi-reader agent; blocks: snoop, input request, message fifo, decoder; data paths: memory data, message data]
Signals: type_in, interface_address[15:0], interface_data[31:0], type_out[7:0], address_out[23:0], data_out[63:0], ok2drive

agent command (AID = multireader): opcode, options[5:0], RA, AID[5:0], RB/imm8
Option flags (op3 to op0) include increment, snoop and destination address.
source address[23:0]: source address in sram
destination address[23:0]: address of destination
count[7:0]: number of bytes to transfer
[Figure: message sender agent; Vobla agent interface; blocks: main entry, shadow entry, output message encoder; signals: reset, clk, ok2drive]

agent command (AID = message_sender): opcode, options[9:0], RA, AID[4:0], RB/imm8

(option[6] = 0)
{0, options[5:0]}, raw_data[31:0], raw_address[23:0], type[7:0]
message data, address of destination, message type

(option[6] = 1)
{1, options[5:0]}, raw_data[63:0], raw_address[23:0], 10000000
message data, address of destination, message type



[Figure: DMA agent; blocks: doorbell, token control, request entries (x2), DMA context table, input message decoder, output message encoder]
Signals: dm_set_mask0 to dm_set_mask3, id[5:0], type_in[7:0], interface_address[23:0], interface_data[31:0], base address[7:0], type_out[7:0], address_out[23:0], data_out[63:0], ok2drive, reset
agent command (AID = dma agent): dma request entry
Option flags: urgent, dir, autosend, set ack address, last in transfer, snoop, calculate crc
address[23:0]: sram address
count[7:0]: number of bytes to transfer or address modifier

[Figure: CRC agent; blocks: register file, input buffer]

agent command (AID = CRC agent): opcode
Option fields: CRC type, data size, generate/check, operation mode, overwrite
[Figure: Vobla agent interface; blocks: control register, counter, prescaler, time stamp register, div by 2; signal: reset]

agent command: opcode, options[9:0], RA, AID, RB/imm8
[Figure: doorbell agent; Vobla agent interface; blocks: doorbell mask, input message decoder, task mask register file, task request register file and counter, TGMR, mask control logic, priority logic]
Signals to Vobla: v_set_doorbell_mask, v_next_task_valid, v_current_task_id, v_next_task_id[5:0]
Ring interface signals: rif_i_write, rif_i_addr[5:0], rif_i_data[4:0], rif_i_reset, rif_i_clock, dm_set_mask0, dm_set_mask1, dm_set_mask3

agent command (AID = doorbell): opcode, options[9:0], RA, AID[4:0], RB/imm8

options: set/clear global (OP2), clear request (OP1), clear mask (OP0)

{0,0,0,0,0,mask_index[2:0]} write mask
or

{0,0,0,0,1,req_bit_index[2:0]} write request
or

{1,0,0,0,0,count_value[2:0]} write counter
or

{0,1,0,0,0,0,0,0} write TGMR
or

{1,1,0,0,0,0,0,urgent_value} write urgent
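
The five encodings listed above can be packed into a single byte as follows. This is a sketch under several assumptions: the listing in the drawing is partly illegible, so the bit patterns are a best reading of it; the leftmost element of each concatenation is taken as the most significant bit; the urgent value is taken to be one bit wide; and all function names are hypothetical.

```python
def _field(value: int, bits: int) -> int:
    """Check that value fits in an unsigned field of the given width."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"value {value} does not fit in {bits} bits")
    return value


def write_mask(mask_index: int) -> int:
    return 0b00000000 | _field(mask_index, 3)     # {0,0,0,0,0,mask_index[2:0]}


def write_request(req_bit_index: int) -> int:
    return 0b00001000 | _field(req_bit_index, 3)  # {0,0,0,0,1,req_bit_index[2:0]}


def write_counter(count_value: int) -> int:
    return 0b10000000 | _field(count_value, 3)    # {1,0,0,0,0,count_value[2:0]}


def write_tgmr() -> int:
    return 0b01000000                             # {0,1,0,0,0,0,0,0}


def write_urgent(urgent_value: int) -> int:
    return 0b11000000 | _field(urgent_value, 1)   # {1,1,0,0,0,0,0,urgent_value}


for name, byte in [
    ("write mask 2", write_mask(2)),
    ("write request 5", write_request(5)),
    ("write counter 7", write_counter(7)),
    ("write TGMR", write_tgmr()),
    ("write urgent 1", write_urgent(1)),
]:
    print(f"{name}: {byte:08b}")
```

The two top bits select between counter, TGMR and urgent writes, while bit 3 distinguishes a request write from a mask write when the top bits are zero.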



[Figure: host interface blocks; labels: interface, FIFO_WR, RD FIFO, Host #2 (DMA), Write request, Read request generator, WR FIFO used words counter, Host #3 (Ring extender), Encryption, DSP, PCI, Memory controller, Trajan memory interface, Trajan FIFO_RD input, Trajan FIFO_WR input, DPR, Host address generator, Target address generator, Interface to external device (device specific), Synchronizer (three instances), Interrupt, FPGA]






[Figure labels: ring in, free, doorbells, ring control, rx status word (last, size)]

setup registers:

doorbell, taskid and viscode
urgent viscode
header len and address
threshold to urgent






[Figure labels: mii, mac, ring in, fifo ram, doorbell, free entry count, finished frames count, ring control]

setup registers:

doorbell, taskid and viscode
urgent viscode
send ahead address
threshold to transmit
threshold to urgent
[Figure: Vobla Core; labels: Internal Memory, Program, Address, Agent Bus]

[Figure: pipeline stages: instruction fetch request, instruction decode, address calculation, read source registers, write result into destination register; also data access request, data execution]

[Figure labels: Flip Flop; AAL1 and other AAL types; protocol specific data path functional blocks; generic data path processing functional blocks; service]